perm filename PROPO9[7,ALS] blob sn#033818 filedate 1973-04-09 generic text, type T, neo UTF8
                                                        April 3 1973

                A Proposal for Speech Understanding Research


        It is proposed that the work on speech recognition that is now under way in the A.I.
project at Stanford University be continued and extended as a separate project with
broadened aims in the field of speech understanding.  This work gives considerable promise
both of solving some of the immediate problems that beset speech understanding research
and of providing a basis for future advances.

        It is further proposed that this work be more closely tied to the ARPA Speech
Understanding Research effort than it has been in the past and that it have as its express
aim the study and application to speech recognition of a machine learning process that has
proved highly successful in another application and that has already been tested out to a
limited extent in speech recognition.  The machine learning process offers both an automatic
training scheme and the inherent ability of the system to adapt to various speakers and
dialects.  Speech recognition via machine learning represents a global approach to the
speech recognition problem and can be incorporated into a wide class of limited vocabulary
systems.

        Finally, we would propose accepting responsibility for keeping other ARPA projects
supplied with operating versions of the best current programs that we have developed.  The
availability of the high quality front end that the signature table approach provides would
enable designers of the various over-all systems to test the relative performance of the
top-down portions of their systems without having to make allowances for the deficiencies
of their currently available front ends.  Indeed, if the signature table scheme can be made
simple enough to compete on a time basis (and we believe that it can) then it may replace
the other front end schemes that are currently in favor.

        Stanford University is well suited as the site for such work, having both the facilities
for this work and a staff of people with experience and interest in machine learning,
phonetic analysis, and digital signal processing.

        Ultimately we would like to have a system capable of understanding speech from an
unlimited domain of discourse and with an unknown speaker.  It seems not unreasonable to
expect the system to deal with this situation very much as people do when they adapt their
understanding processes to the speaker's idiosyncrasies during the conversation.  The
signature table method gives promise of contributing toward the solution of this problem as
well as being a possible answer to some of the more immediate problems.

        The initial thrust of the proposed work would be toward the development of adaptive
learning techniques, using the signature table method and some more recent variants and
extensions of this basic procedure.  We have already demonstrated the usefulness of this
method for the initial assignment of significant features to the acoustic signals.  One of the
next steps will be to extend the method to include acoustic-phonetic probabilities in the
decision process.

        Still another aspect to be studied would be the amount of preprocessing that should
be done and the desired balance between bottom-up and top-down approaches.  It is fairly
obvious that decisions of this sort should ideally be made dynamically, depending upon the
familiarity of the system with the domain of discourse and with the characteristics of the
speaker.  Compromises will undoubtedly have to be made in any immediately realizable
system, but we should understand better than we now do the limitations on the system that
such compromises impose.

        It may be well at this point to describe the general philosophy that has been followed
in the work that is currently under way and the results that have been achieved to date.
We have been studying elements of a speech recognition system that is not dependent upon
the use of a limited vocabulary and that can recognize continuous speech by a number of
different speakers.

        Such a system should be able to function successfully either without any previous
training for the specific speaker in question or after a short training session in which the
speaker would be asked to repeat certain phrases designed to train the system on those
phonetic utterances that seemed to depart from the previously learned norm.  In either case
it is believed that some automatic or semi-automatic training system should be employed to
acquire the data that is used for the identification of the phonetic information in the speech.
We believe that this can best be done by employing a modification of the signature table
scheme previously described.  A brief review of this earlier form of signature table is given
in Appendix 1.

        The over-all system is envisioned as one that uses the more or less conventional
method of separating the input speech into short time slices, for each of which some sort of
frequency analysis (homomorphic, LPC, or the like) is done.  We then interpret this
information in terms of significant features by means of a set of signature tables.  At this
point we define longer sections of the speech called EVENTS, which are obtained by grouping
together varying numbers of the original slices on the basis of their similarity.  This then
takes the place of other forms of initial segmentation.  Having identified a series of EVENTS
in this way, we next use another set of signature tables to extract information from the
sequence of events and combine it with a limited amount of syntactic and semantic
information to define a sequence of phonemes.

        While it would be possible to extend this bottom-up approach still further, it seems
reasonable to break off at this point and revert to a top-down approach from here on.  The
real difference in the overall system would then be that the top-down analysis would deal
with the outputs from the signature table section as its primitives rather than with the
outputs from the initial measurements either in the time domain or in the frequency domain.
In the case of inconsistencies the system could either refer to the second choices retained
within the signature tables or, if need be, could always go clear back to the input parameters.
The decision as to how far to carry the initial bottom-up analysis must depend upon the
relative cost of this analysis, both in complexity and in processing time, and the certainty
with which it can be performed, as compared with the costs associated with the rest of the
analysis and the certainty with which it can be performed, taking due notice of the costs in
time of recovering from false starts.

        Signature tables can be used to perform four essential functions that are required in
the automatic recognition of speech.  These functions are: (1) the elimination of superfluous
and redundant information from the acoustic input stream, (2) the transformation of the
remaining information from one coordinate system to a more phonetically meaningful
coordinate system, (3) the mixing of acoustically derived data with syntactic, semantic and
linguistic information to obtain the desired recognition, and (4) the introduction of a learning
mechanism.

        The following three advantages emerge from this method of training and evaluation.
        1) Essentially arbitrary inter-relationships between the input terms are taken into
account by any one table.  The only loss of accuracy is in the quantization.
        2) The training is a very simple process of accumulating counts.  The training samples
are introduced sequentially, and hence simultaneous storage of all the samples is not
required.
        3) The process linearizes the storage requirements in the parameter space.

        The signature tables, as used in speech recognition, must be particularized to allow
for the multi-category nature of the output.  Several forms of tables have been investigated.
Details of the current system are given in Appendix 2.  Some results are summarized in an
attached report.

        Work is currently under way on a major refinement of the signature table approach
which adopts a somewhat more rigorous procedure.  Preliminary results with this scheme
indicate that a substantial improvement has been achieved.

                Appendix 1

        The early form of a signature table

        For those not familiar with the use of signature tables as used by Samuel in programs
which played the game of checkers, the concept is best illustrated (Fig. 1) by an arrangement
of tables used in the program.  There are 27 input terms.  Each term evaluates a specific
aspect of a board situation and is quantized into a limited but adequate range of values: 7,
5 and 3 in this case.  The terms are divided into 9 sets with 3 terms each, forming the 9
first level tables.  Outputs from the first level tables are quantized to 5 levels and combined
into 3 second level tables and, finally, into one third level table whose output represents
the figure of merit of the board in question.

        A signature table has an entry for every possible combination of the input vector.
Thus there are 7*5*3 or 105 entries in each of the first level tables.  Training consists of
accumulating two counts for each entry during a training sequence.  Count A is incremented
when the current input vector represents a preferred move and count D is incremented when
it is not the preferred move.  The output from the table is computed as a correlation
coefficient

                C = (A-D)/(A+D).

        The figure of merit for a board is simply the coefficient obtained as the output from
the final table.
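
        The counting scheme of a single first level table can be sketched as follows.  This is
only an illustration under stated assumptions, not the checkers program itself: the class
name, the method names, and the neutral value of 0.0 reported for an entry that has never
been trained are all assumed here.

```python
from itertools import product


class SignatureTable:
    """One table holding an (A, D) count pair per combination of quantized inputs."""

    def __init__(self, ranges=(7, 5, 3)):
        self.ranges = ranges
        # One [A, D] pair for every possible input combination: 7*5*3 = 105 by default.
        self.counts = {
            combo: [0, 0] for combo in product(*(range(r) for r in ranges))
        }

    def train(self, inputs, preferred):
        """Increment A if the input vector went with a preferred move, else D."""
        pair = self.counts[tuple(inputs)]
        pair[0 if preferred else 1] += 1

    def output(self, inputs):
        """Return C = (A-D)/(A+D) for the addressed entry."""
        a, d = self.counts[tuple(inputs)]
        if a + d == 0:
            return 0.0  # assumption: an untrained entry reports a neutral value
        return (a - d) / (a + d)
```

An entry trained three times on preferred moves and once on a non-preferred move would
report C = (3-1)/(3+1) = 0.5.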

                Appendix 2

        Initial Form of Signature Table for Speech Recognition

        The signature tables, as used in speech recognition, must be particularized to allow
for the multi-category nature of the output.  Several forms of tables have been investigated.
The initial form tested and used for the data presented in the attached paper uses tables
consisting of two parts, a preamble and the table proper.  The preamble contains: (1) space
for saving a record of the current and recent output reports from the table, (2) identifying
information as to the specific type of table, (3) a parameter that identifies the desired
output from the table and that is used in the learning process, (4) a gating parameter
specifying the input that is to be used to gate the table, (5) the sign of the gate, (6) the
gating level to be used and (7) parameters that identify the sources of the normal inputs to
the table.

        All inputs are limited in range and specify either the absolute level of some basic
property or more usually the probability of some property being present.  These inputs may
be from the original acoustic input or they may be the outputs of other tables.  If from other
tables they may be for the current time step or for earlier time steps (subject to practical
limits as to the number of time steps that are saved).

        The output, or outputs, from each table are similarly limited in range and specify, in all
cases, a probability that some particular significant feature, phonette, phoneme, word
segment, word or phrase is present.

        We are limiting the range of inputs and outputs to values specified by 3 bits and the
number of entries per table to 64, although this choice of values is a matter to be
determined by experiment.  We are also providing for any of the following input
combinations: (1) one input of 6 bits, (2) two inputs of 3 bits each, (3) three inputs of 2 bits
each, and (4) six inputs of 1 bit each.  The uses to which these different forms are put will
be described later.
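
        Each of the four input combinations addresses the same 64 entries, since the input
widths always total 6 bits.  A minimal sketch of such addressing follows; the function name
and the choice of placing the first input in the high-order bits are assumptions made for
illustration, since the ordering is not stated here.

```python
# The four input layouts named in the text, as bit widths per input.
ALLOWED_LAYOUTS = {(6,), (3, 3), (2, 2, 2), (1, 1, 1, 1, 1, 1)}


def entry_index(values, widths):
    """Pack quantized inputs into a single index in the range 0..63."""
    widths = tuple(widths)
    if widths not in ALLOWED_LAYOUTS:
        raise ValueError(f"unsupported input layout: {widths}")
    if len(values) != len(widths):
        raise ValueError("one value is required per input")
    index = 0
    for value, width in zip(values, widths):
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        # Assumption: earlier inputs occupy the higher-order bits.
        index = (index << width) | value
    return index
```

With two 3-bit inputs of 5 and 3, for example, the entry addressed is 5*8 + 3 = 43.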

        The body of each table contains entries corresponding to every possible combination
of the allowed input parameters.  Each entry in the table actually consists of several parts.
There are fields assigned to accumulate counts of the occurrences of incidents in which the
specifying input values coincided with the different desired outputs from the table as found
during previous learning sessions, and there are fields containing the summarized results of
these learning sessions, which are used as outputs from the table.  The outputs from the
tables can then express, to the allowed accuracy, all possible functions of the input
parameters.

Operation in the Training Mode

        When operating in the training mode the program is supplied with a sequence of
stored utterances with accompanying phonetic transcriptions.  Each segment of the incoming
speech signal is analysed (Fourier transforms or inverse filter equivalent) to obtain the
necessary input parameters for the lowest level tables in the signature table hierarchy.  At
the same time reference is made to a table of phonetic "hints" which prescribe the desired
outputs from each table which correspond to all possible phonemic inputs.  The signature
tables are then processed.

        The processing of each table is done in two steps, one performed at each entry to the
table and the second only periodically.  The first process consists of locating a single entry
line within the table as specified by the inputs to the table and adding a 1 to the
appropriate field to indicate the presence of the property specified by the hint table as
corresponding to the phoneme specified in the phonemic transcription.  At this time a report
is also made as to the table's output as determined from the averaged results of previous
learning so that a running record may be kept of the performance of the system.  At periodic
intervals all tables are updated to incorporate recent learning results.

        To make this process easily understandable, let us restrict our attention to a table
used to identify a single significant feature, say Voicing.  The hint table will identify
whether or not the phoneme currently being processed is to be considered voiced.  If it is
voiced, a 1 is added to the "yes" field of the entry line located by the normal inputs to the
table.  If it is not voiced, a 1 is added to the "no" field.  At updating time the output that
this entry will subsequently report is determined by dividing the accumulated sum in the
"yes" field by the sum of the numbers in the "yes" and the "no" fields, and reporting this
quantity as a number in the range from 0 to 7.  Actually the process is a bit more
complicated than this and it varies with the exact type of table under consideration, as
reported in detail in appendix B.  Outputs from the signature tables are not probabilities, in
the strict sense, but are the statistically-arrived-at odds based on the actual learning
sequence.
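
        The two-step processing of a single-feature table can be sketched as follows.  This is
a simplified illustration only, since the actual process is described above as more
complicated.  The class and method names are hypothetical, and scaling the yes/(yes+no)
ratio to 0..7 by multiplying by 7 and rounding is an assumption.

```python
class FeatureTable:
    """Table for one feature (e.g. Voicing) with "yes"/"no" counts per entry."""

    def __init__(self, n_entries=64):
        self.yes = [0] * n_entries
        self.no = [0] * n_entries
        # Summarized outputs; these change only at updating time.
        self.report = [0] * n_entries

    def observe(self, index, hint_present):
        """First step: report the current output, then add 1 to the hinted field."""
        current = self.report[index]  # based on previously summarized learning only
        if hint_present:
            self.yes[index] += 1
        else:
            self.no[index] += 1
        return current

    def update(self):
        """Second step, run periodically: refresh outputs from the counts."""
        for i, (y, n) in enumerate(zip(self.yes, self.no)):
            if y + n:
                # Assumption: the ratio is scaled to 0..7 by rounding 7*y/(y+n).
                self.report[i] = round(7 * y / (y + n))
```

Note that counts accumulated since the last update do not affect the reported output until
the next update, which matches the distinction drawn above between the two steps.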

        The preamble of the table has space for storing twelve past outputs.  An input to a
table can be delayed to that extent.  This table relates outcomes of previous events with
the present hint, the learning input.  A certain amount of context dependent learning is thus
possible with the limitation that the specified delays are constant.

        The interconnected hierarchy of tables forms a network which runs incrementally, in
steps synchronous with the time window over which the input signal is analysed.  The present
window width is set at 12.8 ms. (256 points at 20 K samples/sec.) with an overlap of 6.4 ms.
Inputs to this network are the parameters abstracted from the frequency analyses of the
signal, and the specified hint.  The outputs of the network could be either the probability
attached to every phonetic symbol or the output of a table associated with a feature such
as voiced, vowel etc.  The point to be made is that the output generated for a segment is
essentially independent of its contiguous segments.  The dependency achieved by using
delays in the inputs is invisible to the outputs.  The outputs thus report the best estimate
on what the current acoustic input is with no relation to the past outputs.  Relating the
successive outputs along the time dimension is realised by counters.
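
        The window figures quoted above can be checked with a short calculation: 12.8 ms at
20,000 samples per second is 256 points, and a 6.4 ms overlap leaves a step of 128 points
between successive windows.  The function below is a hypothetical helper written for this
check; its name and its treatment of a trailing partial window (dropped) are assumptions.

```python
def frame_starts(n_samples, sample_rate=20_000, window_ms=12.8, overlap_ms=6.4):
    """Return (window length, step, start indices), all in samples."""
    window = round(sample_rate * window_ms / 1000)
    step = window - round(sample_rate * overlap_ms / 1000)
    # Assumption: a trailing partial window is dropped rather than padded.
    starts = list(range(0, n_samples - window + 1, step))
    return window, step, starts
```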

The Use of COUNTERS

        The transition from initial segment space to event space is made possible by means of
COUNTERS which are summed and reinitiated whenever their inputs cross specified threshold
values, being triggered on when the input exceeds the threshold and off when it falls below.
Momentary spikes are eliminated by specifying time hysteresis, the number of consecutive
segments for which the input must be above the threshold.  The output of a counter
provides information about starting time, duration and average input for the period it was
active.
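
        A counter of this kind can be sketched as follows.  This is a minimal illustration
under assumptions: the function name is hypothetical, time is measured in segment indices,
and the starting time reported is taken to be the first segment above the threshold, a
detail not fixed by the description above.

```python
def run_counter(inputs, threshold, hysteresis=1):
    """Return (start, duration, average input) for each period the counter was active.

    A run of above-threshold segments shorter than `hysteresis` is treated as a
    momentary spike and discarded.
    """
    reports = []
    run = []      # above-threshold values in the run currently being accumulated
    start = None
    for t, value in enumerate(inputs):
        if value > threshold:
            if not run:
                start = t
            run.append(value)
        else:
            if len(run) >= hysteresis:
                reports.append((start, len(run), sum(run) / len(run)))
            run = []  # reinitiate the counter
    if len(run) >= hysteresis:  # close a run still active at the end of the input
        reports.append((start, len(run), sum(run) / len(run)))
    return reports
```

For the input sequence 0, 5, 0, 6, 7, 6, 0, 0 with a threshold of 4 and a hysteresis of 2
segments, the single spike at segment 1 is dropped and one report is produced, starting at
segment 3 and lasting 3 segments.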

        Since a counter can reference a table at any level in the hierarchy of tables, it can
reflect any desired degree of information reduction.  For example, a counter may be set up
to show a section of speech to be a vowel, a front vowel or the vowel /I/.  The counters
can be looked upon to represent a mapping of parameter-time space into a feature-time
space, or at a higher level symbol-time space.  It may be useful to carry along the feature
information as a back up in those situations where the symbolic information is not acceptable
to syntactic or semantic interpretation.

        In the same manner as the tables, the counters run completely independently of each
other.  In a recognition run the counters may overlap in arbitrary fashion, may leave gaps
where no counter has been triggered or may not line up nicely.  A properly segmented
output, where the consecutive sections are in time sequence and are neatly labeled, is
essential for processing it further.  This is achieved by registering the instants when the
counters are triggered or terminated to form time segments called events.

        An event is the period between successive activation or termination of any counter.
An event shorter than a specified time is merely ignored.  A record of event durations and
up to three active counters, ordered according to their probability, is maintained.
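
        The formation of events from counter reports can be sketched as follows.  This is an
illustration only: the function name and the tuple layout are hypothetical, the average
input of a counter is used as a stand-in for its probability, and periods in which no
counter is active are kept as events with an empty list, which is an assumption.

```python
def build_events(counters, min_duration=1):
    """Split time at every counter activation or termination.

    `counters` is a list of (label, start, duration, average) tuples.
    Returns (start, duration, up to three active labels, most probable first).
    """
    boundaries = sorted({t for _, s, d, _ in counters for t in (s, s + d)})
    events = []
    for begin, end in zip(boundaries, boundaries[1:]):
        if end - begin < min_duration:
            continue  # an event shorter than the specified time is ignored
        active = [
            (avg, label)
            for label, s, d, avg in counters
            if s <= begin and end <= s + d
        ]
        active.sort(reverse=True)  # highest average input first
        events.append((begin, end - begin, [label for _, label in active[:3]]))
    return events
```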

        An event resulting from the processing described so far represents a phonette, one
of the basic speech categories defined as hints in the learning process.  It is only an
estimate of closeness to a speech category, based on past learning.  Also each category has
a more-or-less stationary spectral characterisation.  Thus a category may have a phonemic
equivalent, as in the case of vowels; it may be common to a phoneme class, as for the voiced
or unvoiced stop gaps; or it may be subphonemic, as a T-burst or a K-burst.  The choices are
based on acoustic expediency, i.e. optimisation of the learning, rather than any linguistic
considerations.  However, higher level interpretive programs may best operate on inputs
resembling phonemic transcription.  The contiguous events may be coalesced into
phoneme-like units using dyadic or triadic probabilities and acoustic-phonetic rules
particular to the system.  For example, a period of silence followed by a type of burst or a
short friction may be combined to form the corresponding stop.  A short friction or a burst
following a nasal or a lateral may be called a stop even if the silence period is short or
absent.  Clearly these rules must be specific to the system, based on the confidence with
which durations and phonette categories are recognised.
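
        The two example rules above can be sketched as a pass over a list of labeled events.
This is an illustration only, covering just the case in which the silence period is
absent.  The label names, the function name and the limit on what counts as a "short"
friction are all hypothetical, and the dyadic or triadic probabilities are left out.

```python
SONORANTS = {"nasal", "lateral"}  # hypothetical labels for the preceding context


def coalesce_stops(events, max_friction=3):
    """Merge (label, duration) events into stops using the two example rules."""

    def is_release(label, duration):
        # A burst, or a friction short enough to count as a stop release.
        return label == "burst" or (label == "friction" and duration <= max_friction)

    merged = []
    i = 0
    while i < len(events):
        label, duration = events[i]
        following = events[i + 1] if i + 1 < len(events) else None
        if label == "silence" and following and is_release(*following):
            # Rule 1: silence followed by a burst or short friction forms a stop.
            merged.append(("stop", duration + following[1]))
            i += 2
        elif is_release(label, duration) and merged and merged[-1][0] in SONORANTS:
            # Rule 2: after a nasal or lateral, no silence period is required.
            merged.append(("stop", duration))
            i += 1
        else:
            merged.append((label, duration))
            i += 1
    return merged
```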